In this assignment, you will perform fundamental analysis for the Toronto dwellings market to allow potential real estate investors to choose rental investment properties.
# imports
import warnings
warnings.filterwarnings('ignore')
import panel as pn
pn.extension('plotly')
import plotly.express as px
import pandas as pd
import hvplot.pandas
import matplotlib.pyplot as plt
import os
from pathlib import Path
from dotenv import load_dotenv
%matplotlib inline
#%matplotlib notebook
#%matplotlib widget
# Read the Mapbox API key
load_dotenv()
map_box_api = os.getenv("mapbox")
# Read the census data into a Pandas DataFrame
file_path = Path("Data/toronto_neighbourhoods_census_data.csv")
to_data = pd.read_csv(file_path, index_col="year")
to_data.head()
| year | neighbourhood | single_detached_house | apartment_five_storeys_plus | movable_dwelling | semi_detached_house | row_house | duplex | apartment_five_storeys_less | other_house | average_house_value | shelter_costs_owned | shelter_costs_rented |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | Agincourt North | 3715 | 1480 | 0 | 1055 | 1295 | 195 | 185 | 5 | 200388 | 810 | 870 |
| 2001 | Agincourt South-Malvern West | 3250 | 1835 | 0 | 545 | 455 | 105 | 425 | 0 | 203047 | 806 | 892 |
| 2001 | Alderwood | 3175 | 315 | 0 | 470 | 50 | 185 | 370 | 0 | 259998 | 817 | 924 |
| 2001 | Annex | 1060 | 6090 | 5 | 1980 | 605 | 275 | 3710 | 165 | 453850 | 1027 | 1378 |
| 2001 | Banbury-Don Mills | 3615 | 4465 | 0 | 240 | 380 | 15 | 1360 | 0 | 371864 | 1007 | 1163 |
In this section, you will calculate the number of dwelling types per year. Visualize the results using bar charts and the Pandas plot function.
Hint: Use the Pandas groupby function.
Optional challenge: Plot each bar chart in a different color.
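As a minimal sketch of the groupby hint, the toy example below sums unit counts per year on a small DataFrame indexed by year. The numbers and the `toy` name are made up for illustration; only the column names mirror the census data.

```python
import pandas as pd

# Toy data (made-up numbers), indexed by year like to_data
toy = pd.DataFrame(
    {
        "neighbourhood": ["A", "B", "A", "B"],
        "single_detached_house": [10, 20, 30, 40],
        "duplex": [1, 2, 3, 4],
    },
    index=pd.Index([2001, 2001, 2006, 2006], name="year"),
)

# Sum every numeric column within each year; numeric_only=True skips the
# text column so it never ends up in the totals
units_per_year = toy.groupby("year").sum(numeric_only=True)
print(units_per_year)
```

Each row of the result corresponds to one year, which is the shape the bar charts in this section expect.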
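# Calculate the total number of units of each dwelling type per year (hint: use groupby)
# YOUR CODE HERE!
# numeric_only=True keeps the text column (neighbourhood) out of the sums
housing_units = to_data.groupby('year').sum(numeric_only=True).iloc[:,:8]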
housing_units
| year | single_detached_house | apartment_five_storeys_plus | movable_dwelling | semi_detached_house | row_house | duplex | apartment_five_storeys_less | other_house |
|---|---|---|---|---|---|---|---|---|
| 2001 | 300930 | 355015 | 75 | 90995 | 52355 | 23785 | 116900 | 3040 |
| 2006 | 266860 | 379400 | 165 | 69430 | 54690 | 44095 | 162850 | 1335 |
| 2011 | 274940 | 429220 | 100 | 72480 | 60355 | 44750 | 163895 | 2165 |
| 2016 | 269680 | 493270 | 95 | 71200 | 61565 | 48585 | 165575 | 2845 |
# Save the dataframe as a csv file
# YOUR CODE HERE!
housing_units.to_csv('dwelling_type_file.csv')
# Helper create_bar_chart function
def create_bar_chart(data, title, xlabel, ylabel, color):
    """
    Create a bar plot based on the data argument.
    """
    plt.figure(figsize=(5,3))
    plt.bar(data.index, data.values, color=color)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    plt.xticks(rotation=90)
    plt.show()
# Create a bar chart per year to show the number of dwelling types
housing_units_T = housing_units.T
# Bar chart for 2001
# YOUR CODE HERE!
create_bar_chart(housing_units_T[2001], 'Dwelling types in Toronto in 2001', '2001', 'Dwelling type units', color='red')
# Bar chart for 2006
# YOUR CODE HERE!
create_bar_chart(housing_units_T[2006], 'Dwelling types in Toronto in 2006', '2006', 'Dwelling type units', color='blue')
# Bar chart for 2011
# YOUR CODE HERE!
create_bar_chart(housing_units_T[2011], 'Dwelling types in Toronto in 2011', '2011', 'Dwelling type units', color='yellow')
# Bar chart for 2016
# YOUR CODE HERE!
create_bar_chart(housing_units_T[2016], 'Dwelling types in Toronto in 2016', '2016', 'Dwelling type units', color='maroon')
In this section, you will calculate the average monthly shelter costs for owned and rented dwellings and the average house value for each year. Plot the results as a line chart.
Optional challenge: Plot each line chart in a different color.
# Calculate the average monthly shelter costs for owned and rented dwellings
# YOUR CODE HERE!
avg_own_rent_cost = to_data[['shelter_costs_owned','shelter_costs_rented']].groupby(['year']).mean()
avg_own_rent_cost
| year | shelter_costs_owned | shelter_costs_rented |
|---|---|---|
| 2001 | 846.878571 | 1085.935714 |
| 2006 | 1316.800000 | 925.414286 |
| 2011 | 1448.214286 | 1019.792857 |
| 2016 | 1761.314286 | 1256.321429 |
# Helper create_line_chart function
def create_line_chart(data, title, xlabel, ylabel, color):
    """
    Create a line chart based on the data argument.
    """
    plt.plot(data.index, data.values, color=color)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.title(title)
    plt.show()
# Create two line charts: one for the monthly shelter costs of owned dwellings and one for rented dwellings, per year
# Line chart for owned dwellings
# YOUR CODE HERE!
create_line_chart(avg_own_rent_cost['shelter_costs_owned'],
'Average Monthly Shelter Cost for Owned Dwellings in Toronto',
'Year', 'Avg Monthly Shelter Costs', color='blue')
# Line chart for rented dwellings
# YOUR CODE HERE!
create_line_chart(avg_own_rent_cost['shelter_costs_rented'],
'Average Monthly Shelter Cost for Rented Dwellings in Toronto',
'Year', 'Avg Monthly Shelter Costs', color='yellow')
In this section, you will determine the average house value per year. An investor may want to better understand how the sale price of a rental property changes over time. For example, a customer will want to know whether to expect the property value to rise or fall so they can decide how long to hold the rental property. You will visualize the average_house_value per year as a line chart.
# Calculate the average house value per year
# YOUR CODE HERE!
avg_house_val_per_yr = to_data[['average_house_value']].groupby(['year']).mean()
avg_house_val_per_yr
| year | average_house_value |
|---|---|
| 2001 | 289882.885714 |
| 2006 | 424059.664286 |
| 2011 | 530424.721429 |
| 2016 | 664068.328571 |
# Plot the average house value per year as a line chart
# YOUR CODE HERE!
create_line_chart(avg_house_val_per_yr['average_house_value'],
'Average House Value in Toronto',
'Year', 'Avg House Value', color='blue')
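To put a number on whether values rose or fell between census years, one option is to express the yearly averages above as percentage changes. The sketch below re-enters the four averages from the output table by hand; the names `avg_value` and `growth_pct` are illustrative and not part of the assignment.

```python
import pandas as pd

# Average house values per census year, copied from the table above
avg_value = pd.Series(
    [289882.885714, 424059.664286, 530424.721429, 664068.328571],
    index=pd.Index([2001, 2006, 2011, 2016], name="year"),
    name="average_house_value",
)

# Percentage change from the previous census year; the first year has no
# predecessor, so its value is NaN
growth_pct = avg_value.pct_change().mul(100).round(1)
print(growth_pct)
```

In the notebook itself, the same call could be applied directly to `avg_house_val_per_yr['average_house_value']`.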
In this section, you will use hvplot to create an interactive visualization of the average house value with a dropdown selector for the neighbourhood.
Hint: It will be easier to create a new DataFrame from grouping the data and calculating the mean house values for each year and neighbourhood.
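A small sketch of the shape this hint is aiming for, using made-up values: grouping by both year and neighbourhood and resetting the index gives one row per (year, neighbourhood) pair, which is the long format a dropdown selector filters on. The `toy`, `long_form`, and `one_area` names are illustrative only.

```python
import pandas as pd

# Toy data (made-up values), indexed by year like to_data
toy = pd.DataFrame(
    {
        "neighbourhood": ["A", "B", "A", "B"],
        "average_house_value": [100.0, 200.0, 150.0, 260.0],
    },
    index=pd.Index([2001, 2001, 2006, 2006], name="year"),
)

# Group on the index level "year" and the column "neighbourhood" together,
# then turn both keys back into ordinary columns
long_form = (
    toy.groupby(["year", "neighbourhood"])["average_house_value"]
    .mean()
    .reset_index()
)

# Selecting one neighbourhood is what the dropdown does behind the scenes
one_area = long_form[long_form["neighbourhood"] == "A"]
print(one_area)
```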
# Create a new DataFrame with the mean house values by neighbourhood per year
# YOUR CODE HERE!
avg_house_val_by_neighbourhood = to_data[['neighbourhood','average_house_value']].groupby(['year','neighbourhood']).mean().reset_index()
avg_house_val_by_neighbourhood.head(10)
| | year | neighbourhood | average_house_value |
|---|---|---|---|
| 0 | 2001 | Agincourt North | 200388.0 |
| 1 | 2001 | Agincourt South-Malvern West | 203047.0 |
| 2 | 2001 | Alderwood | 259998.0 |
| 3 | 2001 | Annex | 453850.0 |
| 4 | 2001 | Banbury-Don Mills | 371864.0 |
| 5 | 2001 | Bathurst Manor | 304749.0 |
| 6 | 2001 | Bay Street Corridor | 257404.0 |
| 7 | 2001 | Bayview Village | 327644.0 |
| 8 | 2001 | Bayview Woods-Steeles | 343535.0 |
| 9 | 2001 | Bedford Park-Nortown | 565304.0 |
# Use hvplot to create an interactive line chart of the average house value per neighbourhood
# The plot should have a dropdown selector for the neighbourhood
# YOUR CODE HERE!
avg_house_val_by_neighbourhood.hvplot(x='year',y='average_house_value',kind='line',groupby='neighbourhood',dynspread=True).opts(framewise=True)
In this section, you will use hvplot to create an interactive visualization of the average number of dwelling types per year with a dropdown selector for the neighbourhood.
# Fetch the data of all dwelling types per year
# YOUR CODE HERE!
num_dwelling_type_by_neighbourhood = to_data.groupby(['year','neighbourhood']).sum().reset_index()
num_dwelling_type_by_neighbourhood.head(10)
| | year | neighbourhood | single_detached_house | apartment_five_storeys_plus | movable_dwelling | semi_detached_house | row_house | duplex | apartment_five_storeys_less | other_house | average_house_value | shelter_costs_owned | shelter_costs_rented |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2001 | Agincourt North | 3715 | 1480 | 0 | 1055 | 1295 | 195 | 185 | 5 | 200388 | 810 | 870 |
| 1 | 2001 | Agincourt South-Malvern West | 3250 | 1835 | 0 | 545 | 455 | 105 | 425 | 0 | 203047 | 806 | 892 |
| 2 | 2001 | Alderwood | 3175 | 315 | 0 | 470 | 50 | 185 | 370 | 0 | 259998 | 817 | 924 |
| 3 | 2001 | Annex | 1060 | 6090 | 5 | 1980 | 605 | 275 | 3710 | 165 | 453850 | 1027 | 1378 |
| 4 | 2001 | Banbury-Don Mills | 3615 | 4465 | 0 | 240 | 380 | 15 | 1360 | 0 | 371864 | 1007 | 1163 |
| 5 | 2001 | Bathurst Manor | 2405 | 1550 | 0 | 130 | 130 | 375 | 745 | 0 | 304749 | 843 | 1052 |
| 6 | 2001 | Bay Street Corridor | 10 | 7575 | 0 | 0 | 15 | 0 | 240 | 0 | 257404 | 1218 | 1142 |
| 7 | 2001 | Bayview Village | 2170 | 630 | 0 | 170 | 765 | 15 | 640 | 0 | 327644 | 1197 | 1164 |
| 8 | 2001 | Bayview Woods-Steeles | 1650 | 1715 | 0 | 925 | 105 | 10 | 170 | 5 | 343535 | 1212 | 1018 |
| 9 | 2001 | Bedford Park-Nortown | 4985 | 2080 | 0 | 45 | 40 | 210 | 1235 | 15 | 565304 | 933 | 1491 |
# Use hvplot to create an interactive bar chart of the number of dwelling types per neighbourhood
# The plot should have a dropdown selector for the neighbourhood
# YOUR CODE HERE!
dwelling_type_col = ['single_detached_house',
'apartment_five_storeys_plus',
'movable_dwelling','semi_detached_house',
'row_house', 'duplex', 'apartment_five_storeys_less', 'other_house']
num_dwelling_type_by_neighbourhood.hvplot.bar(x='year',y=dwelling_type_col,
groupby='neighbourhood',
rot=90,xlabel='Year',ylabel='Dwelling Type Units',width=650,height=500).opts(framewise=True)
In this section, you will need to calculate the house value for each neighbourhood and then sort the values to obtain the top 10 most expensive neighbourhoods on average. Plot the results as a bar chart.
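As an alternative sketch of the "average, then sort, then take the top rows" step, pandas also provides `nlargest`, which combines the sort and the slice. The data below is made up, and the example takes the top 2 rather than the top 10 to stay small.

```python
import pandas as pd

# Toy data (made-up values): two observations per neighbourhood
toy = pd.DataFrame(
    {
        "neighbourhood": ["A", "B", "C", "A", "B", "C"],
        "average_house_value": [100.0, 300.0, 200.0, 120.0, 340.0, 180.0],
    }
)

# Mean value per neighbourhood across all observations
mean_value = toy.groupby("neighbourhood")["average_house_value"].mean()

# Keep the highest means, already ordered from most to least expensive
top_2 = mean_value.nlargest(2)
print(top_2)
```

Either approach should give the same ranking; `sort_values(...).head(10)` keeps all columns, while this version keeps only the value being ranked.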
# Getting the data from the top 10 expensive neighbourhoods
# YOUR CODE HERE!
top_10_expensive_neighbourhoods = to_data.groupby(['neighbourhood']).mean().reset_index().sort_values(by=['average_house_value'],ascending=False).head(10).reset_index(drop=True)
top_10_expensive_neighbourhoods
| | neighbourhood | single_detached_house | apartment_five_storeys_plus | movable_dwelling | semi_detached_house | row_house | duplex | apartment_five_storeys_less | other_house | average_house_value | shelter_costs_owned | shelter_costs_rented |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bridle Path-Sunnybrook-York Mills | 2260.00 | 331.25 | 0.00 | 36.25 | 90.00 | 25.0 | 40.00 | 0.00 | 1526485.75 | 2360.75 | 2321.75 |
| 1 | Forest Hill South | 1742.50 | 2031.25 | 1.25 | 61.25 | 45.00 | 75.0 | 1027.50 | 3.75 | 1195992.50 | 1781.00 | 1313.75 |
| 2 | Lawrence Park South | 3472.50 | 773.75 | 0.00 | 126.25 | 38.75 | 225.0 | 966.25 | 16.25 | 1094027.75 | 1954.00 | 1372.75 |
| 3 | Rosedale-Moore Park | 2498.75 | 4641.25 | 0.00 | 486.25 | 245.00 | 327.5 | 1618.75 | 2.50 | 1093640.00 | 1909.75 | 1537.25 |
| 4 | St.Andrew-Windfields | 3225.00 | 1670.00 | 0.00 | 185.00 | 552.50 | 97.5 | 586.25 | 5.00 | 999107.00 | 1880.25 | 1384.50 |
| 5 | Casa Loma | 916.25 | 2310.00 | 0.00 | 288.75 | 201.25 | 162.5 | 1192.50 | 2.50 | 981064.25 | 1873.75 | 1547.75 |
| 6 | Bedford Park-Nortown | 4865.00 | 1981.25 | 0.00 | 43.75 | 57.50 | 287.5 | 1275.00 | 88.75 | 930415.25 | 1786.75 | 1255.00 |
| 7 | Forest Hill North | 1488.75 | 3392.50 | 0.00 | 12.50 | 16.25 | 82.5 | 402.50 | 1.25 | 851680.50 | 1722.75 | 1245.50 |
| 8 | Kingsway South | 2326.25 | 576.25 | 0.00 | 66.25 | 48.75 | 20.0 | 336.25 | 2.50 | 843234.25 | 1736.75 | 1622.00 |
| 9 | Yonge-St.Clair | 565.00 | 3948.75 | 0.00 | 425.00 | 212.50 | 172.5 | 1308.75 | 6.25 | 813220.25 | 1680.75 | 1369.00 |
# Plotting the data from the top 10 expensive neighbourhoods
# YOUR CODE HERE!
top_10_expensive_neighbourhoods.hvplot(x='neighbourhood',y='average_house_value',
xlabel='Neighbourhood',ylabel='Avg. House Value',title='Top 10 Most Expensive Neighbourhoods in Toronto',
rot=90,kind='bar',width=650,height=500)
In this section, you will read in neighbourhoods location data and build an interactive map with the average house value per neighbourhood. Use a scatter_mapbox from Plotly express to create the visualization. Remember, you will need your Mapbox API key for this.
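Because the map depends on the API key, it can help to fail early with a clear message when the key is missing. The helper below is a hypothetical sketch, not part of the assignment; it assumes the key is stored under the environment variable name `mapbox`, as in the `os.getenv("mapbox")` call at the top of this notebook.

```python
import os


def require_env(name, environ=os.environ):
    """Return the value of an environment variable or raise a clear error."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file"
        )
    return value


# Possible usage in the notebook, after load_dotenv():
# map_box_api = require_env("mapbox")
```

Reading the key from the environment also keeps it out of the notebook itself, so it is not shared by accident.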
# Load neighbourhoods coordinates data
file_path = Path("Data/toronto_neighbourhoods_coordinates.csv")
df_neighbourhood_locations = pd.read_csv(file_path)
df_neighbourhood_locations.head()
| | neighbourhood | lat | lon |
|---|---|---|---|
| 0 | Agincourt North | 43.805441 | -79.266712 |
| 1 | Agincourt South-Malvern West | 43.788658 | -79.265612 |
| 2 | Alderwood | 43.604937 | -79.541611 |
| 3 | Annex | 43.671585 | -79.404001 |
| 4 | Banbury-Don Mills | 43.737657 | -79.349718 |
You will need to join the location data with the mean values per neighbourhood.
Calculate the mean values for each neighbourhood.
Join the average values with the neighbourhood locations.
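Before relying on the join, it can be worth checking that every neighbourhood in the location file has a match in the averages. The sketch below uses made-up rows; `validate` and `indicator` are standard `pd.merge` parameters, while the `locations`, `averages`, `check`, and `unmatched` names are illustrative.

```python
import pandas as pd

# Toy location and average tables (made-up values); "C" has no average
locations = pd.DataFrame(
    {
        "neighbourhood": ["A", "B", "C"],
        "lat": [43.80, 43.78, 43.60],
        "lon": [-79.26, -79.26, -79.54],
    }
)
averages = pd.DataFrame(
    {"neighbourhood": ["A", "B"], "average_house_value": [110.0, 320.0]}
)

# validate raises if a neighbourhood appears more than once on either side;
# indicator adds a _merge column saying where each row came from
check = pd.merge(
    locations,
    averages,
    on="neighbourhood",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Neighbourhoods that an inner join would silently drop
unmatched = check.loc[check["_merge"] == "left_only", "neighbourhood"].tolist()
print(unmatched)
```

If the list of unmatched names is empty, an inner join keeps every location row.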
# Calculate the mean values for each neighborhood
# YOUR CODE HERE!
avg_by_neighbourhood = to_data.groupby(['neighbourhood']).mean().reset_index()
avg_by_neighbourhood.head()
| | neighbourhood | single_detached_house | apartment_five_storeys_plus | movable_dwelling | semi_detached_house | row_house | duplex | apartment_five_storeys_less | other_house | average_house_value | shelter_costs_owned | shelter_costs_rented |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt North | 3435.00 | 1947.50 | 2.50 | 863.75 | 1406.25 | 512.50 | 547.50 | 10.00 | 329811.5 | 1109.00 | 983.50 |
| 1 | Agincourt South-Malvern West | 2897.50 | 2180.00 | 1.25 | 375.00 | 456.25 | 523.75 | 628.75 | 32.50 | 334189.0 | 1131.25 | 985.00 |
| 2 | Alderwood | 2903.75 | 302.50 | 1.25 | 503.75 | 76.25 | 302.50 | 502.50 | 1.25 | 427922.5 | 1166.75 | 1003.25 |
| 3 | Annex | 751.25 | 7235.00 | 1.25 | 1375.00 | 613.75 | 355.00 | 4605.00 | 83.75 | 746977.0 | 1692.75 | 1315.25 |
| 4 | Banbury-Don Mills | 3572.50 | 5388.75 | 1.25 | 273.75 | 626.25 | 32.50 | 1340.00 | 0.00 | 612039.0 | 1463.50 | 1242.75 |
# Join the average values with the neighbourhood locations
# YOUR CODE HERE!
neighbourhood_df = pd.merge(df_neighbourhood_locations,avg_by_neighbourhood,on='neighbourhood',how='inner')
neighbourhood_df.head()
| | neighbourhood | lat | lon | single_detached_house | apartment_five_storeys_plus | movable_dwelling | semi_detached_house | row_house | duplex | apartment_five_storeys_less | other_house | average_house_value | shelter_costs_owned | shelter_costs_rented |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agincourt North | 43.805441 | -79.266712 | 3435.00 | 1947.50 | 2.50 | 863.75 | 1406.25 | 512.50 | 547.50 | 10.00 | 329811.5 | 1109.00 | 983.50 |
| 1 | Agincourt South-Malvern West | 43.788658 | -79.265612 | 2897.50 | 2180.00 | 1.25 | 375.00 | 456.25 | 523.75 | 628.75 | 32.50 | 334189.0 | 1131.25 | 985.00 |
| 2 | Alderwood | 43.604937 | -79.541611 | 2903.75 | 302.50 | 1.25 | 503.75 | 76.25 | 302.50 | 502.50 | 1.25 | 427922.5 | 1166.75 | 1003.25 |
| 3 | Annex | 43.671585 | -79.404001 | 751.25 | 7235.00 | 1.25 | 1375.00 | 613.75 | 355.00 | 4605.00 | 83.75 | 746977.0 | 1692.75 | 1315.25 |
| 4 | Banbury-Don Mills | 43.737657 | -79.349718 | 3572.50 | 5388.75 | 1.25 | 273.75 | 626.25 | 32.50 | 1340.00 | 0.00 | 612039.0 | 1463.50 | 1242.75 |
Plot the average values per neighbourhood using a Plotly express scatter_mapbox visualization.
# Create a scatter mapbox to analyze neighbourhood info
# YOUR CODE HERE!
# Pass the Mapbox key loaded from the .env file to Plotly Express
px.set_mapbox_access_token(map_box_api)
fig = px.scatter_mapbox(neighbourhood_df, lat="lat", lon="lon", color="average_house_value", size="average_house_value",
    color_continuous_scale=px.colors.cyclical.IceFire,height=400,width=800,title='Average House Values in Toronto')
fig.show()
In this section, you will use Plotly express to create a couple of plots that investors can interactively filter to explore various factors related to house values in Toronto's neighbourhoods.
to_data.head()
| year | neighbourhood | single_detached_house | apartment_five_storeys_plus | movable_dwelling | semi_detached_house | row_house | duplex | apartment_five_storeys_less | other_house | average_house_value | shelter_costs_owned | shelter_costs_rented |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2001 | Agincourt North | 3715 | 1480 | 0 | 1055 | 1295 | 195 | 185 | 5 | 200388 | 810 | 870 |
| 2001 | Agincourt South-Malvern West | 3250 | 1835 | 0 | 545 | 455 | 105 | 425 | 0 | 203047 | 806 | 892 |
| 2001 | Alderwood | 3175 | 315 | 0 | 470 | 50 | 185 | 370 | 0 | 259998 | 817 | 924 |
| 2001 | Annex | 1060 | 6090 | 5 | 1980 | 605 | 275 | 3710 | 165 | 453850 | 1027 | 1378 |
| 2001 | Banbury-Don Mills | 3615 | 4465 | 0 | 240 | 380 | 15 | 1360 | 0 | 371864 | 1007 | 1163 |
# YOUR CODE HERE!
fig = px.bar(avg_house_val_by_neighbourhood, x="neighbourhood", y="average_house_value",
color='average_house_value', facet_row="year",
height=1000,width=900,
title='Average House Value per Neighbourhood')
fig.show()
# Fetch the data from all expensive neighbourhoods per year.
# YOUR CODE HERE!
filter_df = to_data[to_data.neighbourhood.isin(top_10_expensive_neighbourhoods.neighbourhood)]
most_expensive_neighbourhood_per_year = filter_df.groupby(['year','neighbourhood']).sum().reset_index()
most_expensive_neighbourhood_per_year.head()
| | year | neighbourhood | single_detached_house | apartment_five_storeys_plus | movable_dwelling | semi_detached_house | row_house | duplex | apartment_five_storeys_less | other_house | average_house_value | shelter_costs_owned | shelter_costs_rented |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2001 | Bedford Park-Nortown | 4985 | 2080 | 0 | 45 | 40 | 210 | 1235 | 15 | 565304 | 933 | 1491 |
| 1 | 2001 | Bridle Path-Sunnybrook-York Mills | 2275 | 110 | 0 | 25 | 15 | 10 | 20 | 0 | 927466 | 1983 | 1790 |
| 2 | 2001 | Casa Loma | 1035 | 1700 | 0 | 415 | 190 | 185 | 1090 | 5 | 596077 | 1241 | 1500 |
| 3 | 2001 | Forest Hill North | 1565 | 3380 | 0 | 10 | 0 | 0 | 485 | 5 | 517466 | 940 | 1428 |
| 4 | 2001 | Forest Hill South | 1815 | 2440 | 5 | 65 | 45 | 85 | 1010 | 15 | 726664 | 1001 | 1469 |
# Create the sunburst chart
# YOUR CODE HERE!
import numpy as np
fig = px.sunburst(
    most_expensive_neighbourhood_per_year,
    path=['year', 'neighbourhood'],
    values='shelter_costs_owned',
    color='shelter_costs_owned',
    color_continuous_scale='RdBu',
    # Centre the colour scale on the cost-weighted average shelter cost
    color_continuous_midpoint=np.average(
        most_expensive_neighbourhood_per_year['shelter_costs_owned'],
        weights=most_expensive_neighbourhood_per_year['shelter_costs_owned'],
    ),
    height=600, width=800,
    title='Cost Analysis of Most Expensive Neighbourhoods in Toronto per Year',
)
fig.show()